Shannon information and self-similarity in whole genomes
Abstract
The Shannon information (SI) in distributions of occurrence frequency of short words in whole genomes is shown to exhibit universality. For a given word length, the SI in genomes of all lengths is the same as that in random sequences of a universal length Lr. For the shorter words, Lr is far shorter than the genome. For example, Lr ∼ 1000 bases for three-letter words. We further show that whole genomes are highly self-similar in the sense that any segment of the genome down to a length of Λsim, about twice Lr, also shares the universal property. We devise a simple genome growth model in which genome-size sequences grown by maximally stochastic segmental duplication and random mutation possess the universal and self-similar properties of genomes. © 2005 Elsevier B.V. All rights reserved. PACS: 87.10.+e; 89.70.+c; 87.14.Gg; 87.23.Kg; 02.50.-r
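The abstract does not spell out how the SI of a word-frequency distribution is computed. The sketch below is one plausible reading, not the authors' stated method: it assumes SI is the entropy deficit log(τ) − H of the distribution of overlapping k-letter words, where τ = 4^k is the number of possible words and H is the Shannon entropy in nats. The function names, the four-letter alphabet, and the fixed random seed are illustrative choices.

```python
import math
import random
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-letter words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_information(seq, k, alphabet="ACGT"):
    """Entropy deficit log(tau) - H of the k-letter word distribution, in nats.

    Assumption: this definition (deviation from the uniform distribution)
    stands in for the SI used in the abstract, which is not defined there.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    tau = len(alphabet) ** k  # number of possible k-letter words
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.log(tau) - entropy


def random_sequence(length, alphabet="ACGT", seed=0):
    """Uniformly random sequence; the seed is fixed only for reproducibility."""
    rng = random.Random(seed)
    return "".join(rng.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    # Under this definition, SI of a random sequence shrinks as it gets longer.
    for length in (1_000, 10_000, 100_000):
        print(length, shannon_information(random_sequence(length), k=3))
```

Under this reading, the universal length Lr for a given k would be the length of the random sequence whose SI matches that of the genome. Because the SI of a random sequence falls as its length grows, a genome with a large SI corresponds to a short equivalent random sequence.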
Similar resources
Self-similarity in complete genomes
Recently it was reported that in terms of the global feature of frequency distributions of short words, whole genomes are equivalent to random sequences of a much shorter length which, for given word length, is genome independent, or universal. For two-letter words the universal equivalent random-sequence length was found to be about 300 bases. Here we show that as a rule whole genomes are high...
A New Model for Best Customer Segment Selection Using Fuzzy TOPSIS Based on Shannon Entropy
In today’s competitive market, for a business firm to win higher profit than its rivals, it must evaluate and rank its potential customer segments to improve its Customer Relationship Management (CRM). This makes more efficient decision-making methods important in the current fast-growing information era. These decisions usually involve several criteria,...
Uncertainty Modeling of a Group Tourism Recommendation System Based on Pearson Similarity Criteria, Bayesian Network and Self-Organizing Map Clustering Algorithm
Group tourism is one of the most important tasks in tourist recommender systems. These systems, despite the potential contradictions among the group's tastes, seek to provide joint suggestions to all members of the group, and propose recommendations that satisfy the group as a whole rather than individual users. Another issue that has received less attention i...
Predicting CpG Islands and Their Relationship with Genomic Feature in Cattle by Hidden Markov Model Algorithm
Cattle supply an important source of nutrition for humans worldwide. CpG islands (CGIs) are very important and useful, as they carry functionally relevant epigenetic loci for whole genome studies. There have been no formal analyses of CGIs at the DNA sequence level in cattle genomes, and this study was carried out to fill that gap. We used hidden Markov model alg...
Universal Lengths in Microbial Genomes and Implication for Early Genome Growth
We report the discovery of a set of universal lengths that characterize all complete microbial genomes. The Shannon information [Shannon 1948] of 108 complete microbial genomes relative to that of their respective randomized counterparts is computed, and the results are summarized in a two-parameter exponential relation: Lr(k) = (42 ± 21) × 2.64^k, 2 ≤ k ≤ 10, where Lr is a "root-sequence length" ...
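As a rough arithmetic check, the snippet below tabulates the exponential relation, assuming it reads Lr(k) = 42 × 2.64^k with k as the exponent (the exponent is not legible in the listing and is an assumption here). The function name `root_sequence_length` is illustrative, and only the central value of the prefactor is used; the quoted ± 21 uncertainty is ignored.

```python
def root_sequence_length(k, prefactor=42.0, base=2.64):
    """Central estimate of Lr(k) = prefactor * base**k.

    Assumption: k is the exponent of 2.64; the +/- 21 uncertainty on the
    prefactor quoted in the snippet is not propagated.
    """
    return prefactor * base ** k


if __name__ == "__main__":
    for k in range(2, 11):
        print(f"k = {k:2d}  Lr ~ {root_sequence_length(k):,.0f} bases")
```

Under this assumption, k = 2 gives about 293 bases and k = 3 about 773 bases. Both agree in order of magnitude with the values of about 300 and about 1000 bases quoted for two- and three-letter words in the entries above.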
Journal title:
- Computer Physics Communications
Volume: 169
Publication date: 2005